Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs
Abstract
In response to the constant increase in wire delays, Non-Uniform Cache Architecture (NUCA) has been introduced as an effective memory model for dealing with growing memory latencies. This architecture divides a large cache into smaller banks that can be accessed independently. Because wire delay grows with distance, banks close to the cache controller respond faster than banks located farther away from it. In this paper, we propose and analyse the insertion of an additional bank, called Last Bank, into the NUCA cache. This extra bank stores data blocks that have been evicted from the other banks in the NUCA cache. Furthermore, we analyse the behaviour of cache line replacements in the NUCA cache and propose two optimisations of Last Bank that provide significant performance benefits without incurring unaffordable implementation costs.
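To make the idea concrete, here is a minimal Python sketch of a toy NUCA cache with and without a Last Bank. It rests on assumptions the abstract does not state: blocks are statically interleaved across banks by address, each bank is a tiny fully associative LRU store, and the latencies (`BANK_BASE_LATENCY`, `PER_HOP_LATENCY`, `LAST_BANK_LATENCY`, `MEMORY_LATENCY`) are made-up numbers. The class names and the choice to move a block back to its home bank on a Last Bank hit are hypothetical illustrations, not the authors' design and not their two optimisations.

```python
from collections import OrderedDict

# Hypothetical latencies in cycles; not values from the paper.
BANK_BASE_LATENCY = 4    # bank nearest the cache controller
PER_HOP_LATENCY = 2      # extra cycles per bank of distance (wire delay)
LAST_BANK_LATENCY = 30   # assumed: slower than NUCA banks, faster than memory
MEMORY_LATENCY = 200     # assumed off-chip access latency


class LruBank:
    """A tiny fully associative bank with LRU replacement (assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            return True
        return False

    def remove(self, addr):
        self.blocks.pop(addr, None)

    def insert(self, addr):
        """Insert addr; return the evicted block address, or None."""
        victim = None
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # least recently used
        self.blocks[addr] = True
        return victim


class NucaWithLastBank:
    """Hypothetical toy model: NUCA victims go to Last Bank, not off-chip."""

    def __init__(self, num_banks, bank_capacity, last_bank_capacity):
        self.banks = [LruBank(bank_capacity) for _ in range(num_banks)]
        # A capacity of 0 disables Last Bank (baseline NUCA).
        self.last_bank = LruBank(last_bank_capacity) if last_bank_capacity else None
        self.stats = {"nuca_hits": 0, "last_bank_hits": 0, "memory": 0}

    def home_bank(self, addr):
        # Assumed static interleaving of blocks across banks.
        return addr % len(self.banks)

    def _fill(self, addr):
        """Place addr in its home bank; send any NUCA victim to Last Bank."""
        victim = self.banks[self.home_bank(addr)].insert(addr)
        if victim is not None and self.last_bank is not None:
            self.last_bank.insert(victim)  # Last Bank's own victim is dropped (off-chip)

    def access(self, addr):
        """Return the hypothetical latency of one access, in cycles."""
        b = self.home_bank(addr)
        if self.banks[b].lookup(addr):
            self.stats["nuca_hits"] += 1
            return BANK_BASE_LATENCY + PER_HOP_LATENCY * b  # farther bank, slower
        if self.last_bank is not None and self.last_bank.lookup(addr):
            self.stats["last_bank_hits"] += 1
            self.last_bank.remove(addr)  # assumed: reused block returns to NUCA
            self._fill(addr)
            return LAST_BANK_LATENCY
        self.stats["memory"] += 1
        self._fill(addr)
        return MEMORY_LATENCY


# Two blocks that conflict in bank 0 and are reused alternately.
trace = [0, 2, 0, 2, 0]
baseline = NucaWithLastBank(num_banks=2, bank_capacity=1, last_bank_capacity=0)
with_lb = NucaWithLastBank(num_banks=2, bank_capacity=1, last_bank_capacity=1)
baseline_cycles = sum(baseline.access(a) for a in trace)
with_lb_cycles = sum(with_lb.access(a) for a in trace)
print(baseline.stats, baseline_cycles)
print(with_lb.stats, with_lb_cycles)
```

On this trace the baseline goes off-chip on all five accesses (1000 cycles), while the Last Bank version goes off-chip only twice and serves the three reuses from Last Bank (490 cycles). The gain comes entirely from catching reused evicted blocks before they leave the chip, which is the effect the abstract attributes to Last Bank; real latencies, bank organisation and replacement policy would change the numbers.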
Similar resources
A Compile-Time Data Locality Optimization Framework for NUCA Chip Multiprocessors
With increasing numbers of cores, future CMPs (Chip MultiProcessors) are likely to have a tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved distribution of the address space. For data-parallel programming models, there is a mismatch between such a non-uniform cache organization and the canonical row-major or column-major layouts of multi-dimensional arrays...
3D Tree Cache – A Novel Approach to Non-Uniform Access Latency Cache Architectures for 3D CMPs
We consider a non-uniform access latency cache architecture (NUCA) design for 3D chip multiprocessors (CMPs) where cache structures are divided into small banks interconnected by a network-on-chip (NoC). In earlier NUCA designs, data is placed in banks either statically (S-NUCA) or dynamically (D-NUCA). In both S-NUCA and D-NUCA designs, scaling to hundreds of cores can pose several challenges. ...
BP-NUCA: Cache Pressure-Aware Migration for High-Performance Caching in CMPs
As the momentum behind Chip Multi-Processors (CMPs) continues to grow, Last Level Cache (LLC) management becomes a crucial issue to CMPs because off-chip accesses often involve a big latency. Private cache design is distinguished by smaller local access latency, good performance isolation and easy scalability, thus is becoming an attractive design alternative for LLC of CMPs. This paper propose...
Adaptive Block Pinning Based: Dynamic Cache Partitioning for Multi-Core Architectures
This paper is aimed at exploring the various techniques currently used for partitioning last level (L2/L3) caches in multicore architectures, identifying their strengths and weaknesses and thereby proposing a novel partitioning scheme known as Adaptive Block Pinning which would result in a better utilization of the cache resources in CMPs. The widening speed gap between processors and memory al...
Critique for paper “Optimizing Replication, Communication, and Capacity Allocation in CMPs”
The paper presented an approach for extracting the benefits of both unified cache and distributed cache approaches using Non-Uniform Cache Architecture.
● The NUCA approach enables us to retain as many memory references on chip as possible, thus greatly reducing the penalty due to off-chip accesses
● The approach utilizes distance locality, thereby data is stored as close to the processor core as possi...
Publication date: 2009